INVITED SPEAKER

Assoc. Prof. Muhammad Tariq Mahmood
Korea University of Technology and Education, South Korea
BIO: He received the MCS degree in Computer Science from AJK University, Muzaffarabad, Pakistan, in 2004, the MS degree in Intelligent Software Systems from Blekinge Institute of Technology, Sweden, in 2006, and the PhD degree in Information and Mechatronics from Gwangju Institute of Science and Technology (GIST), Korea, in 2011. Early in his career, he worked as a Software Engineer for over eight years at Khaksar and Co., Islamabad, Pakistan. He has been involved in several research projects funded by the National Research Foundation (NRF) of Korea, focusing on areas such as shape-from-focus/defocus, smart cities, and underwater imaging. He has authored more than 100 research articles in reputable journals and at international conferences. He is currently an Associate Professor in the School of Computer Science and Engineering at Korea University of Technology and Education (KOREATECH), Cheonan, Korea. His research interests include image processing, 3D shape recovery from image focus, computer vision, pattern recognition, and machine/deep learning.
Title of Speech: Iterative Deep Shape-from-Focus: A Recurrent Framework for Depth Estimation
Abstract: Depth estimation is a fundamental task in computer vision, enabling machines to infer the 3D structure of a scene from 2D images. Among passive optical methods, Shape-from-Focus (SFF) is particularly advantageous due to its simple setup, minimal hardware requirements, and ease of implementation. In SFF systems, a series of images, termed a focal stack, is captured by a single camera at varying focus settings. A quantified focus score is computed for each pixel across the focal stack, forming a 3D focus volume, and a dense 2D depth map is then estimated by identifying, for each pixel, the image index in the focus volume at which the focus score is maximized. The focus volume thus plays a crucial role in inferring accurate depth. However, conventional approaches often regress the 3D focus volume directly into a 2D depth map, which can amplify errors and lead to a loss of spatial context. To address this, we propose a deep learning-based method that treats depth estimation as an iterative optimization problem. The process begins with an encoder backbone that extracts multi-scale features from each focal slice, while a separate context encoder processes a reference image to capture global scene cues. A focus mapping module then integrates these multi-scale features into single-channel focus volumes that capture per-pixel focus likelihood across the focal stack. These focus volumes are passed to a recurrent depth extraction module, composed of multiple Gated Recurrent Unit (GRU) layers operating at different resolutions. This module iteratively refines the depth prediction, with joint supervision applied to all intermediate outputs to improve accuracy. Experiments on both synthetic and real-world datasets demonstrate that our framework produces accurate depth maps and generalizes better to unseen real-world scenarios than existing state-of-the-art methods.
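
For context, the conventional SFF pipeline described in the abstract can be sketched in a few lines of Python. The sketch below is a minimal illustration, not the speaker's implementation: it uses the modified Laplacian, one common hand-crafted focus measure, and the 5x5 averaging window and function names are assumptions for illustration only.

    import numpy as np
    from scipy.ndimage import convolve, uniform_filter

    def focus_volume(focal_stack):
        """Per-pixel focus scores for an (N, H, W) stack of grayscale slices."""
        # Modified-Laplacian kernels: |d2I/dx2| + |d2I/dy2| responds to sharp detail.
        kx = np.array([[0, 0, 0], [-1, 2, -1], [0, 0, 0]], dtype=np.float64)
        ky = kx.T
        fv = np.empty(focal_stack.shape, dtype=np.float64)
        for i, img in enumerate(focal_stack.astype(np.float64)):
            ml = np.abs(convolve(img, kx)) + np.abs(convolve(img, ky))
            # Average the focus measure over a 5x5 window for robustness.
            fv[i] = uniform_filter(ml, size=5)
        return fv

    def depth_from_focus(focal_stack):
        """Winner-take-all depth: the slice index of maximal focus per pixel."""
        return np.argmax(focus_volume(focal_stack), axis=0)  # (H, W) index map

As the abstract notes, this winner-take-all mapping from the 3D focus volume to a 2D index map discards spatial context, which is what motivates the learned, iterative alternative.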
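The abstract leaves the architectural details to the talk; purely as a hedged illustration of the recurrent idea, the PyTorch sketch below shows how a convolutional GRU can iteratively refine a depth map from focus-volume features, keeping every intermediate prediction so joint supervision can be applied. All module names, channel widths, and the residual-update scheme are assumptions, not the speaker's actual design.

    import torch
    import torch.nn as nn

    class ConvGRUCell(nn.Module):
        # Convolutional GRU cell: refines hidden state h given input x.
        def __init__(self, hidden_dim, input_dim):
            super().__init__()
            self.convz = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
            self.convr = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
            self.convq = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)

        def forward(self, h, x):
            hx = torch.cat([h, x], dim=1)
            z = torch.sigmoid(self.convz(hx))  # update gate
            r = torch.sigmoid(self.convr(hx))  # reset gate
            q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))  # candidate
            return (1 - z) * h + z * q

    class RecurrentDepthRefiner(nn.Module):
        # Iteratively refines a depth estimate from focus-volume features.
        def __init__(self, feat_dim=64, hidden_dim=64, iters=8):
            super().__init__()
            self.hidden_dim, self.iters = hidden_dim, iters
            self.gru = ConvGRUCell(hidden_dim, feat_dim + 1)  # features + depth
            self.head = nn.Conv2d(hidden_dim, 1, 3, padding=1)  # depth residual

        def forward(self, focus_feats, init_depth):
            b, _, hgt, wid = focus_feats.shape
            h = focus_feats.new_zeros(b, self.hidden_dim, hgt, wid)
            depth, predictions = init_depth, []
            for _ in range(self.iters):
                h = self.gru(h, torch.cat([focus_feats, depth], dim=1))
                depth = depth + self.head(h)  # residual refinement step
                predictions.append(depth)
            return predictions  # all intermediates, for joint supervision

Joint supervision would then sum a per-iteration loss over all entries of predictions, typically weighting later iterations more heavily so the final estimate dominates.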